Tools for efficient Deep Learning
In the era of Deep Learning (DL), there is a fast-growing demand for building and deploying Deep Neural Networks (DNNs) on a variety of platforms. This thesis proposes five tools to address the challenges of designing DNNs that are efficient in time, resources, and power consumption.
We first present Aegis and SPGC, which address the challenges of improving the memory efficiency of DL training and inference. Aegis makes mixed precision training (MPT) more stable through layer-wise gradient scaling. Empirical experiments show that Aegis can improve MPT accuracy by up to 4%. SPGC focuses on structured pruning: replacing standard convolution with group convolution (GConv) to avoid irregular sparsity. SPGC formulates GConv pruning as a channel permutation problem and proposes a novel heuristic polynomial-time algorithm. Common DNNs pruned by SPGC achieve up to 1% higher accuracy than with prior work.
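The thesis's Aegis algorithm itself is not reproduced here; as a rough sketch of the general idea behind layer-wise gradient scaling in mixed precision training (each layer gets its own scale so that small gradients survive the fp16 cast), one might write the following. All names and the scale heuristic are illustrative, not Aegis's:

```python
import numpy as np

def scaled_grad_roundtrip(grad, scale):
    """Scale a gradient, cast to fp16 (simulating mixed precision),
    then cast back and unscale. Tiny values that would lose precision
    in fp16 survive when the per-layer scale is large enough."""
    g16 = (grad * scale).astype(np.float16)
    return g16.astype(np.float32) / scale

# Per-layer scales chosen from each layer's gradient magnitude
# (a simple proxy for what a layer-wise scheme could track).
grads = {
    "conv1": np.array([1e-6, 2e-6], dtype=np.float32),  # tiny gradients
    "fc":    np.array([0.5, -0.25], dtype=np.float32),  # healthy gradients
}
for name, g in grads.items():
    scale = 2.0 ** np.floor(np.log2(1e3 / np.max(np.abs(g))))
    restored = scaled_grad_roundtrip(g, scale)
    print(name, np.allclose(restored, g, rtol=1e-2))
```

A single global loss scale, by contrast, must compromise between layers whose gradient magnitudes differ by orders of magnitude.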
This thesis also addresses the gap between DNN descriptions and executables, with Polygeist for software and POLSCA for hardware. Many novel techniques, e.g. statement splitting and memory partitioning, are explored and used to extend polyhedral optimisation. Polygeist speeds up sequential and parallel software execution by 2.53 and 9.47 times on Polybench/C. POLSCA achieves a 1.5 times speedup over hardware designs generated directly from high-level synthesis on Polybench/C.
Moreover, this thesis presents Deacon, a framework that generates FPGA-based DNN accelerators with streaming architectures and advanced pipelining techniques, addressing the challenges posed by heterogeneous convolutions and residual connections. Deacon provides fine-grained pipelining, graph-level optimisation, and heuristic exploration by graph colouring. Compared with prior designs, Deacon improves resource/power-consumption efficiency by 1.2x/3.5x for MobileNets and 1.0x/2.8x for SqueezeNets.
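Deacon's actual exploration pass is not described in the abstract; a minimal, hypothetical sketch of the kind of greedy graph-colouring heuristic it alludes to (the conflict graph and layer names are invented for illustration) could look like:

```python
def greedy_coloring(adj):
    """Greedy graph colouring: give each node the smallest colour not
    used by an already-coloured neighbour. adj maps node -> set of
    conflicting nodes (e.g. stages that cannot share a resource)."""
    colors = {}
    for node in sorted(adj, key=lambda n: -len(adj[n])):  # high degree first
        used = {colors[nb] for nb in adj[node] if nb in colors}
        c = 0
        while c in used:
            c += 1
        colors[node] = c
    return colors

# Hypothetical conflict graph between pipeline stages.
conflicts = {
    "conv1": {"conv2", "res_add"},
    "conv2": {"conv1", "conv3"},
    "conv3": {"conv2"},
    "res_add": {"conv1"},
}
print(greedy_coloring(conflicts))
```

Greedy colouring never uses more than (maximum degree + 1) colours, which makes it a cheap heuristic for resource-sharing exploration.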
All these tools are open source, and some have already gained public engagement. We believe they can make efficient deep learning applications easier to build and deploy.
Hyperbolic Concentration, Anti-concentration, and Discrepancy
The Chernoff bound is a fundamental tool in theoretical computer science. It has
been extensively used in randomized algorithm design and stochastic analysis.
Discrepancy theory, which deals with finding a bi-coloring of a set system such
that the coloring of each set is balanced, has a huge number of applications in
approximation algorithm design. The Chernoff bound [Che52] implies that a random
bi-coloring of any set system with $n$ sets and $n$ elements will have
discrepancy $O(\sqrt{n \log n})$ with high probability, while the famous result
by Spencer [Spe85] shows that there exists an $O(\sqrt{n})$ discrepancy
solution.
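The random-coloring bound is easy to check empirically. A small numpy experiment (sizes arbitrary) measuring the discrepancy $\max_S |\sum_{i \in S} x_i|$ of a random $\pm 1$ colouring:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000                             # n sets over n elements
A = rng.integers(0, 2, size=(n, n))  # incidence matrix of a random set system
x = rng.choice([-1, 1], size=n)      # a uniformly random bi-colouring
disc = np.max(np.abs(A @ x))         # discrepancy of this colouring
print(disc, np.sqrt(n * np.log(n)))  # disc is on the order of sqrt(n log n)
```

Each set here has about $n/2$ elements, so each signed sum has standard deviation about $\sqrt{n/2}$, and the maximum over $n$ sets picks up the extra $\sqrt{\log n}$ factor.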
The study of hyperbolic polynomials dates back to the early 20th century, when
they were used to solve PDEs, e.g. by Gårding [Går59]. In recent years, more
applications have been found in control theory, optimization, real algebraic
geometry, and so on. In particular, the breakthrough result by Marcus,
Spielman, and Srivastava [MSS15] uses the theory of hyperbolic polynomials to
prove the Kadison-Singer conjecture [KS59], which is closely related to
discrepancy theory.
In this paper, we present a list of new results for hyperbolic polynomials:
* We show two nearly optimal hyperbolic Chernoff bounds: one for Rademacher
sum of arbitrary vectors and another for random vectors in the hyperbolic cone.
* We show a hyperbolic anti-concentration bound.
* We generalize the hyperbolic Kadison-Singer theorem [Brä18] for vectors
in sub-isotropic position, and prove a hyperbolic Spencer theorem for any
constant hyperbolic rank vectors.
The classical matrix Chernoff and discrepancy results are based on the
determinant polynomial. To the best of our knowledge, this paper is the first
work that shows either concentration or anti-concentration results for
hyperbolic polynomials. We hope our findings provide more insights into
hyperbolic and discrepancy theories.
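As a concrete anchor for the determinant-polynomial remark: for a symmetric matrix $S$, the characteristic polynomial $t \mapsto \det(tI - S)$ has only real roots (the eigenvalues), which is exactly hyperbolicity along the identity direction. A quick numerical sanity check, with a matrix chosen at random:

```python
import numpy as np

rng = np.random.default_rng(1)
d = 5
B = rng.standard_normal((d, d))
S = (B + B.T) / 2        # arbitrary symmetric matrix

coeffs = np.poly(S)      # characteristic polynomial coefficients of S
roots = np.roots(coeffs) # all roots should be (numerically) real
print(np.allclose(roots.imag, 0, atol=1e-6))
```

For a non-symmetric matrix the same experiment generically produces complex roots, so real-rootedness really is a property of the symmetric (hyperbolic) case.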
Collective modes of a collisional anisotropic quark-gluon plasma
In this paper we consider the collective modes of a momentum-space
anisotropic quark-gluon plasma taking into account the effect of collisions
between the plasma constituents. Our analysis is carried out using a
collisional kernel of Bhatnagar-Gross-Krook form and extends prior analyses in
the literature by considering all possible angles of propagation of the gluonic
modes relative to the momentum-anisotropy axis. We extract both the stable and
unstable modes as a function of the collision rate and confirm prior findings
that gluonic unstable modes can be eliminated from the spectrum if the
collision rate is sufficiently large. In addition, we discuss the conditions
necessary for the existence of unstable modes and present evidence that
unstable mode growth rates are maximal for modes with momentum along the
anisotropy direction. Finally, we demonstrate that when the collision rate is
finite, gluonic unstable modes are absent from the spectrum at both
small and large momentum anisotropy. These results pave the way for
understanding the impact of collisions on a variety of non-equilibrium
quark-gluon plasma observables.
Comment: 19 pages and 15 figures
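For context, the Bhatnagar-Gross-Krook (BGK) collisional kernel is a relaxation-time approximation. Schematically (notation generic rather than taken from this paper; $\nu$ is the collision rate, $f_{\rm eq}$ an equilibrium distribution, and the factor $n/n_{\rm eq}$ restores particle-number conservation):

\[
\left( \partial_t + \mathbf{v}\cdot\nabla_{\mathbf{x}} \right) f(\mathbf{x},\mathbf{p},t) \;=\; C[f],
\qquad
C[f] \;=\; -\,\nu \left( f(\mathbf{x},\mathbf{p},t) - \frac{n(\mathbf{x},t)}{n_{\rm eq}}\, f_{\rm eq}(\mathbf{p}) \right),
\]

so larger $\nu$ relaxes the distribution toward equilibrium faster, which is why a sufficiently large collision rate can damp the unstable modes away.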
Symmetric Sparse Boolean Matrix Factorization and Applications
In this work, we study a variant of nonnegative matrix factorization where we
wish to find a symmetric factorization of a given input matrix into a sparse,
Boolean matrix. Formally speaking, given $\mathbf{M} \in \mathbb{Z}^{m \times m}$,
we want to find $\mathbf{W} \in \{0,1\}^{m \times r}$ such that
$\| \mathbf{M} - \mathbf{W}\mathbf{W}^{\top} \|_0$ is minimized among all
$\mathbf{W}$ for which each row is $k$-sparse. This question turns out to be
closely related to a number of questions like recovering a hypergraph from its
line graph, as well as reconstruction attacks for private neural network
training.
As this problem is hard in the worst case, we study a natural average-case
variant that arises in the context of these reconstruction attacks:
$\mathbf{M} = \mathbf{W}\mathbf{W}^{\top}$ for a uniformly random Boolean matrix
$\mathbf{W}$ with $k$-sparse rows, and the goal is to recover $\mathbf{W}$ up to
column permutation. Equivalently, this can be thought of as recovering a
uniformly random $k$-uniform hypergraph from its line graph.
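A small numpy illustration of this average-case setup (dimensions m, r, k chosen arbitrarily): build a random Boolean W with k-sparse rows, observe M = W W^T, and verify that M determines W only up to column permutation:

```python
import numpy as np

rng = np.random.default_rng(2)
m, r, k = 40, 10, 3          # illustrative sizes: m rows, r columns, k-sparse rows

# Random Boolean W with exactly k ones per row.
W = np.zeros((m, r), dtype=int)
for i in range(m):
    W[i, rng.choice(r, size=k, replace=False)] = 1

M = W @ W.T                  # the observed matrix; the goal is to recover W

# Sanity checks mirroring the problem statement: diagonal entries equal
# the row sparsity k, and any column permutation of W yields the same M.
perm = rng.permutation(r)
print(np.all(np.diag(M) == k), np.array_equal(M, W[:, perm] @ W[:, perm].T))
```

Off-diagonal entries of M count the common support of two rows, i.e. how many hyperedges two vertices of the line graph share.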
Our main result is a polynomial-time algorithm for this problem based on
bootstrapping higher-order information about $\mathbf{W}$ and then decomposing
an appropriate tensor. The key ingredient in our analysis, which may be of
independent interest, is to show that such a matrix $\mathbf{W}$ has full
column rank with high probability as soon as $m = \widetilde{\Omega}(r)$, which
we do using tools from Littlewood-Offord theory and estimates for binary
Krawtchouk polynomials.
Comment: 33 pages, to appear in Innovations in Theoretical Computer Science
(ITCS 2022), v2: updated refs
Efficient Algorithm for Solving Hyperbolic Programs
Hyperbolic polynomials are a class of real-rooted polynomials with a wide
range of applications in theoretical computer science. Each hyperbolic
polynomial also induces a hyperbolic cone that is of particular interest in
optimization due to its generality: by choosing the polynomial properly, one
can easily recover classic optimization problems such as linear programming
and semidefinite programming. In this work, we develop efficient algorithms for
hyperbolic programming, the problem in which one wants to minimize a linear
objective under a system of linear constraints, with the solution required to
lie in the hyperbolic cone induced by the hyperbolic polynomial. Our algorithm
is an instance of the interior point method (IPM) that, instead of following
the central path, follows the central swath, a generalization of the central path.
To implement the IPM efficiently, we utilize a relaxation of the hyperbolic
program to a quadratic program, coupled with the first four moments of the
hyperbolic eigenvalues, which are crucial to updating the optimization
direction. We further show that, given an evaluation oracle for the polynomial,
the number of oracle calls our algorithm requires depends only on the number of
variables $n$ and the degree $d$ of the polynomial, with extra arithmetic
operations whose count depends on the number of constraints $m$
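In the special case where the hyperbolic polynomial is the determinant polynomial, hyperbolic eigenvalues are ordinary matrix eigenvalues, and their first four moments are traces of matrix powers. A hypothetical numpy illustration of the quantities involved (not the paper's algorithm):

```python
import numpy as np

rng = np.random.default_rng(3)
d = 6
B = rng.standard_normal((d, d))
S = (B + B.T) / 2            # symmetric, so det(tI - S) is real-rooted

lam = np.linalg.eigvalsh(S)  # hyperbolic eigenvalues of S for det(.)
# The j-th power sum of the eigenvalues equals tr(S^j); these power
# sums for j = 1..4 are the "first four moments" mentioned above.
moments = [(lam ** j).sum() for j in range(1, 5)]
traces = [np.trace(np.linalg.matrix_power(S, j)) for j in range(1, 5)]
print(np.allclose(moments, traces))
```

An evaluation oracle for the polynomial gives access to such moments without ever computing the eigenvalues explicitly, which is what makes the oracle model natural here.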
- …